Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion Words?

نویسندگان

  • Minh Le
  • Marten Postma
  • Jacopo Urbani
چکیده

Recently, Yuan et al. (2016) have shown the effectiveness of using Long ShortTerm Memory (LSTM) for performing Word Sense Disambiguation (WSD). Their proposed technique outperformed the previous state-of-the-art with several benchmarks, but neither the training data nor the source code was released. This paper presents the results of a reproduction study of this technique using only openly available datasets (GigaWord, SemCore, OMSTI) and software (TensorFlow). From them, it emerged that stateof-the-art results can be obtained with much less data than hinted by Yuan et al. All code and trained models are made freely available.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Word Sense Disambiguation using a Bidirectional LSTM

In this paper we present a model that leverages a bidirectional long short-term memory network to learn word sense disambiguation directly from data. The approach is end-to-end trainable and makes effective use of word order. Further, to improve the robustness of the model we introduce dropword, a regularization technique that randomly removes words from the text. The model is evaluated on two ...

متن کامل

رفع ابهام معنایی واژگان مبهم فارسی با مدل موضوعی LDA

Word sense disambiguation is the task of identifying the correct sense for the word in a given context among a finite set of possible sense. In this paper a model for farsi word sense disambiguation is presented. The model use two group of features: first, all word and stop words around target word and topic models as second features. We extract topics from a farsi corpus with Latent Dirichlet ...

متن کامل

Word Sense Disambiguation Using Skip-Gram and LSTM Models

In this paper, we explore supervised word sense disambiguation (WSD) through two main techniques. Central to both techniques is a learned word vector embedding that has separate word vectors for each sense of a word. When predicting on an unlabeled word, we can consider a window of word that contains it, and iterate over all possible combinations of word senses within this window. Out of all th...

متن کامل

One Single Deep Bidirectional LSTM Network for Word Sense Disambiguation of Text Data

Due to recent technical and scientific advances, we have a wealth of information hidden in unstructured text data such as offline/online narratives, research articles, and clinical reports. To mine these data properly, attributable to their innate ambiguity, a Word Sense Disambiguation (WSD) algorithm can avoid numbers of difficulties in Natural Language Processing (NLP) pipeline. However, cons...

متن کامل

Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation

Word sense disambiguation helps identifying the proper sense of ambiguous words in text. With large terminologies such as the UMLS Metathesaurus ambiguities appear and highly effective disambiguation methods are required. Supervised learning algorithm methods are used as one of the approaches to perform disambiguation. Features extracted from the context of an ambiguous word are used to identif...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1712.03376  شماره 

صفحات  -

تاریخ انتشار 2017